METHOD AND APPARATUS FOR SCENE-
BASED VIDEO WATERMARKING

erence. Co-filed applications entitled "Method and Apparatus
for Embedding Data, Including Watermarks, in Human
Pat. No. 6,061,793, "Method and Apparatus for Embedding
Data, Including Watermarks, in Human Perceptible
Images," Appl. Ser. No. 08/918,122, now U.S. Pat. No.
6,031,914 and "Method and Apparatus for Video
Watermarking," Appl. Ser. No. 08/918,125, and "Digital
Watermarking to Resolve Multiple Claims of Ownership,"
Appl. Ser. No. 08/918,126 are also hereby incorporated by
reference.

STATEMENT REGARDING GOVERNMENT 
RIGHTS 

The present invention was made with government support 
by AFOSR under grant AF/F49620-94-1-0461, NSF grant 
INT-9406954, and AF/F49620-93-1-0558. The Government 
has certain rights in this invention. 

FIELD OF THE INVENTION 

This invention relates generally to techniques for embed- 
ding data such as watermarks, signatures and captions in 
digital data, and more particularly to scene-based water- 
marks in digital data that relates to video. 

BACKGROUND OF THE INVENTION 

Digital video is readily reproduced and distributed over 
information networks. However, these attractive properties 
lead to problems enforcing copyright protection. As a result, 
creators and distributors of digital video are hesitant to 
provide access to their digital intellectual property. Digital 
watermarking has been proposed as a means to identify the 
owner and distribution path of digital data. Digital water- 
marks address this issue by embedding owner identification 
directly into the digital data itself. The information is 
embedded by making small modifications to the pixels in 
each video frame. When the ownership of a video is in 
question, the information can be extracted to completely 
characterize the owner or distributor of the data. 

Video watermarking introduces issues that generally do 
not have a counterpart in images and audio. Video signals 
are highly redundant by nature, with many frames visually 
similar to each other. Due to large amounts of data and 
inherent redundancy between frames, video signals are 
highly susceptible to pirate attacks, including frame 
averaging, frame dropping, interpolation, statistical analysis, 
etc. Many of these attacks may be accomplished with little 
damage to the video signal. A video watermark must handle 
such attacks. Furthermore, it should identify any image 
created from one or more frames in the video. 

Furthermore, to be useful, a watermark must be perceptually
invisible, statistically undetectable, robust to distortions
applied to the host video, and able to resolve multiple
ownership claims. Some watermarking techniques modify
spatial/temporal data samples, while others modify transform
coefficients. A particular problem afflicting all prior art
techniques, however, is the resolution of rightful ownership
of digital data when multiple ownership claims are made,
i.e., the deadlock problem. Watermarking schemes that do
not use the original data set to detect the watermark are most
vulnerable to deadlock. A pirate simply adds his or her
watermark to the watermarked data. It is then impossible to
establish who watermarked the data first.

Watermarking procedures that require the original data set
for watermark detection also suffer from deadlocks. In such
schemes, a party other than the owner may counterfeit a
watermark by "subtracting off" a second watermark from the
publicly available data and claim the result to be his or her
original. This second watermark allows the pirate to claim
copyright ownership since he or she can show that both the
publicly available data and the original of the rightful owner
contain a copy of their counterfeit watermark.

There is a need, therefore, for watermarking procedures 
applicable to video digital data that do not suffer from the 
described shortcomings, disadvantages and problems.

SUMMARY OF THE INVENTION



The above-identified shortcomings, disadvantages and
problems found within the prior art are addressed by the
present invention, which will be understood by reading and
studying the following specification. The invention provides
for the scene-based watermarking of video data.

In one embodiment of the invention, scenes are extracted 
from video host data that is made up of a number of 
successive frames. Each scene thus includes a number of 
frames. Each frame undergoes a wavelet transformation, 
which is then segmented into blocks. A frequency mask is 
applied to the corresponding frequency-domain blocks, 
which is then weighted with the author signature, also in the 
frequency domain. The resulting weighted block is taken out 
of the frequency domain, and then weighted with the spatial 
mask for its corresponding wavelet transformed block. A 
unique watermark generation routine is also described that 
assists in the resolution of deadlock.

The approach of the invention provides advantages over
the approaches found in the prior art. In the prior art, an
independent watermark applied to each frame may result in
detection of the watermark by statistically comparing or
averaging similar regions and objects in successive video
frames, as has been described in the background. However,
the inventive scene-based approach addresses this issue by
embedding a watermark that is a composite of static and
dynamic components, the dynamic components preventing
detection by statistical comparison across frames. Therefore,
statistical comparison or averaging does not yield the
watermark.

Further aspects, advantages and embodiments of the 
invention will become apparent by reference to the 
drawings, and by reading the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS



FIG. 1 is a flowchart of a method of a video watermarking 
process according to an embodiment of the invention; 

FIG. 2 is a flowchart of a method of a scene-based video
watermarking process according to an embodiment of the
invention;

FIG. 3 is a diagram of a typical computer to be used with 
embodiments of the invention; 

FIG. 4 is a block diagram of a specific implementation of
scene-based video watermarking, based on the methods of
FIG. 1 and FIG. 2, according to an embodiment of the
invention;

FIG. 5 is a diagram showing a masking weighting function
k(f) according to one embodiment of the invention; and,

FIG. 6 is a diagram showing a two-band perfect recon- 
struction filter in accordance with which a wavelet transform 
can be computed according to one embodiment of the 
invention. 

DETAILED DESCRIPTION OF THE 
INVENTION 

In the following detailed description of the preferred 
embodiments, reference is made to the accompanying draw- 
ings which form a part hereof, and in which is shown by way 
of illustration specific preferred embodiments in which the 
invention may be practiced. These embodiments are 
described in sufficient detail to enable those skilled in the art 
to practice the invention, and it is to be understood that other 
embodiments may be utilized and that logical, mechanical 
and electrical changes may be made without departing from 
the spirit and scope of the present invention. The following 
detailed description is, therefore, not to be taken in a limiting 
sense. 

Overview of the Watermarking Process 

Referring to FIG. 1, a flowchart of a method of a video 
watermarking process, according to one embodiment of the 
invention, is shown. Specifically, the method of FIG. 1 
embeds watermark data into host video data. In step 10, the
watermark data is generated, which is the signature, or 
watermark, that acts as a unique identifier for the host video 
data. Note that the signature inherently is spread across the 
frequency spectrum without explicit spread-spectrum pro- 
cessing. 

In one embodiment of the invention, the signature is a 
pseudo-random sequence, which is created using a pseudo-random
generator and two keys. With the two proper keys,
the watermark may be extracted. Without the two keys, the 
data hidden in the video is statistically invisible and impos- 
sible to recover. Pseudo-random generators are well within 
the art. For example, the reference R. Rivest, 
"Cryptography," in Handbook of Theoretical Computer Sci- 
ence (J. van Leeuwen, ed.), vol. 1, ch. 13, pp. 717-755, 
Cambridge, Mass.: MIT Press, 1990, which is hereby incor- 
porated by reference, describes such generators. 

In one embodiment, the creation of the watermark data in
step 10 works as follows. The author has two random keys
x1 and x2 (i.e., seeds) from which the pseudo-random
sequence y can be generated using a suitable cryptographic
operator g(x1,x2), as known within the art. The noise-like
sequence y, after some processing, is the actual watermark
hidden into the video stream. The key x1 is author dependent.
The key x2 is signal dependent. In particular, x1 is the
secret key assigned to (or chosen by) the author. Key x2 is
computed from the video signal which the author wishes to
watermark. The signal dependent key is computed from the
masking values of the original signal. The masking values
give us tolerable error levels in the host video signal. The
tolerable error levels are then hashed to a key x2.
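
By way of illustration only, the hashing of tolerable error levels to the signal dependent key x2 may be sketched as follows. The specification does not name a hash function or a quantization rule, so the use of SHA-256, the `step` parameter and the function name `signal_dependent_key` are assumptions of this sketch rather than features of the described embodiment.

```python
import hashlib
import struct


def signal_dependent_key(tolerable_errors, step=1.0):
    """Hash quantized tolerable error levels to an integer key x2.

    ASSUMPTION: SHA-256 and uniform quantization are illustrative choices;
    the specification only states that the levels are hashed to a key.
    """
    h = hashlib.sha256()
    for level in tolerable_errors:
        # Quantize so that tiny numerical differences do not change the key.
        quantized = int(round(level / step))
        h.update(struct.pack(">q", quantized))
    return int.from_bytes(h.digest(), "big")
```

Quantizing before hashing is a design choice of the sketch: it keeps the key stable under small numerical differences in the masking values, while any larger change in the host signal yields a different key.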

The operator g( ) is called a pseudo-random sequence
generator. For the pseudo-random generator to be useful, a
pirate must not be able to predict bits of y or infer the keys
x1 or x2 from knowledge of some bits of y. There are several
popular generators that satisfy these properties, including
RSA, Rabin, Blum/Micali, and Blum/Blum/Shub, as known
within the art. For example, the Blum/Blum/Shub pseudo-random
generator uses the one-way function y = g(x) = x*x
mod n, where n = pq for primes p and q such that
p ≡ q ≡ 3 (mod 4). It can be shown that generating x or y
from partial knowledge of y is computationally infeasible
for the Blum/Blum/Shub generator. The classical maximal
length pseudo noise sequences (i.e., m-sequences) generated
by linear feedback shift registers are not used for this
purpose. Sequences generated by shift registers are
cryptographically insecure, as one can solve for the feedback
pattern (i.e., the keys) given a small number of output bits y.
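
A minimal sketch of the Blum/Blum/Shub recurrence described above is given below. The way the two keys are mixed into a single seed (a SHA-256 digest reduced mod n) and the names `bbs_bits` and `signature_from_keys` are hypothetical; the specification only requires a suitable cryptographic operator g(x1,x2). The primes in the usage note are toy values chosen for readability and offer no security.

```python
import hashlib
from math import gcd


def bbs_bits(seed, p, q, count):
    """Return `count` bits from the recurrence x <- x*x mod n, with n = p*q."""
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q must both be congruent to 3 mod 4")
    n = p * q
    if seed < 2 or gcd(seed, n) != 1:
        raise ValueError("seed must be at least 2 and coprime to n")
    x = (seed * seed) % n      # start from a quadratic residue
    bits = []
    for _ in range(count):
        x = (x * x) % n        # the one-way squaring step
        bits.append(x & 1)     # keep the least significant bit
    return bits


def signature_from_keys(x1, x2, p, q, count):
    """Map keys x1 and x2 to a +1/-1 sequence.

    ASSUMPTION: mixing the keys through SHA-256 is an illustrative choice,
    not the operator g(x1,x2) of the specification.
    """
    n = p * q
    digest = hashlib.sha256(f"{x1}:{x2}".encode("ascii")).digest()
    seed = int.from_bytes(digest, "big") % n
    while seed < 2 or gcd(seed, n) != 1:
        seed = (seed + 1) % n
    return [2 * b - 1 for b in bbs_bits(seed, p, q, count)]
```

For example, `bbs_bits(3, 11, 23, 4)` returns `[1, 0, 0, 1]`. A practical generator would use large secret primes rather than 11 and 23.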

Thus, a pirate is not free to subtract off a second
watermark y' arbitrarily. The pirate must supply the keys x1' and
x2' which generate the watermark y' they wish to embed. It
is computationally infeasible to invert the one-way function
y' = g(x1',x2') to obtain x1' and x2'. Furthermore, x2' is not
arbitrary. It is computed directly from the original video
signal, which is inaccessible to the pirate. As a result, the
two-key pseudo-random sequence author representation
resolves the deadlock problem.

In step 11, a wavelet transform is applied, along the
temporal axis of the video host data, resulting in a
multiresolution temporal representation of the video. In
particular, the representation consists of temporal lowpass
frames and highpass frames. The lowpass frames consist of
the static components in the video scene. The highpass
frames capture the motion components and changing nature
of the video sequence (i.e., the video host data). The
watermark is designed and embedded in each of these
components. The watermarks embedded in the lowpass
frames exist throughout the entire video scene. The
watermarks embedded in the motion frames are highly localized
in time and change rapidly from frame to frame. Thus, the
watermark is a composite of static and dynamic components.
The combined representation overcomes drawbacks
associated with a fixed or independent watermarking
procedure. (That is, watermark detection by statistical
comparison between successive frames is avoided.)

A wavelet transform can be computed using a two-band
perfect reconstruction filter bank as shown in FIG. 6. The
video signal is simultaneously passed through lowpass L filter
70 and highpass H filter 72 and then decimated by 2 (as
represented by elements 74 and 76 of FIG. 6) to give static
(no motion) and dynamic (motion) components of the original
signal. The two decimated signals may be upsampled (as
represented by elements 78 and 80), and then passed through
complementary filters 82 and 84 and summed as represented
by element 86 to reconstruct the original signal. Wavelet
filters are widely available within the art. For instance, the
reference P. P. Vaidyanathan, Multirate Systems and Filter
Banks, Englewood Cliffs, N.J.: PTR Prentice-Hall, Inc.,
1992, which is hereby incorporated by reference, describes
such filters.
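
The two-band analysis and synthesis structure of FIG. 6 can be illustrated with the Haar filter pair, which is the simplest perfect reconstruction filter bank. The specification does not state which wavelet filters are used, so the choice of Haar and the names `haar_analysis` and `haar_synthesis` are assumptions of this sketch. The input is the sequence of values taken by one pixel position across successive frames.

```python
import math


def haar_analysis(signal):
    """Filter and decimate by 2: return (lowpass, highpass) half-rate signals."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    half = len(signal) // 2
    low = [(signal[2 * i] + signal[2 * i + 1]) / s for i in range(half)]
    high = [(signal[2 * i] - signal[2 * i + 1]) / s for i in range(half)]
    return low, high


def haar_synthesis(low, high):
    """Upsample, filter and sum: rebuild the full-rate signal."""
    s = math.sqrt(2.0)
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) / s)
        out.append((lo - hi) / s)
    return out
```

For a pixel whose values over four frames are 10, 10, 10, 12, the first highpass coefficient is zero (no change between the first two frames) and the second is nonzero (motion), while synthesis returns the original four values up to rounding.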

Referring back to FIG. 1, in step 12, the data generated by
step 10 is embedded into a perceptual mask of the host video
data as represented by the temporal wavelet transform of
step 11. The present invention employs perceptual masking
models to determine the optimal locations within host data
in which to insert the watermark. The perceptual mask is
specific to video host data. The mask provides for the
watermark data generated by step 10 to be embedded with
the host data, at places typically imperceptible to the human
eye. That is, the perceptual mask exploits masking properties
of the human visual system. Step 12 embeds the
watermark within the temporally wavelet transformed host
data such that it will not be perceived by a human eye, as
defined by the perceptual model. The perceptual masking of
step 12 is conducted in the frequency domain.

Thus, image masking models based on the human visual
system (HVS) are used to ensure that the watermark embedded
into each video frame is perceptually invisible and
robust. Visual masking refers to a situation where a signal
raises the visual threshold for other signals around it.
Masking characteristics are used in high quality low bit rate
coding algorithms to further reduce bit rates. The masking
models presented here are based on image models.

The masking models give the perceptual tolerance for 
image coefficients and transform coefficients. These mask- 
ing models are also described in the reference B. Zhu, et al., 
"Low Bit Rate Near-Transparent Image Coding," in Proc. of 
the SPIE Int'l Conf. on Wavelet Apps. for Dual Use, vol.
2491, (Orlando, Fla.), pp. 173-184, 1995, which is hereby 
incorporated by reference, and in the reference B. Zhu, et al., 
"Image Coding with Mixed Representations and Visual 
Masking," in Proc. 1995 IEEE Int'l Conf. on Acoustics, 
Speech and Signal Processing, (Detroit, Mich.), pp. 
2327-2330, 1995, which is also hereby incorporated by 
reference. The frequency masking model is based on the
knowledge that a masking grating raises the visual threshold
for signal gratings around the masking frequency. The model,
which is based on the discrete cosine transform (DCT), expresses
the contrast threshold at frequency f as a function of f, the
masking frequency f_m and the masking contrast c_m:

$$c(f, f_m) = c_o(f) \cdot \max\{1, [k(f/f_m)\, c_m]^{\alpha}\},$$

where c_o(f) is the detection threshold at frequency f. The
mask weighting function k(f) is shown in FIG. 5. To find the
contrast threshold c(f) at a frequency f in an image, the DCT
is first used to transform the image into the frequency
domain and find the contrast at each frequency. The value
α = 0.62 was determined experimentally by psycho-visual
tests, as described in G. E. Legge and J. M. Foley,
"Contrast Masking in Human Vision," Journal of the Optical
Society of America, vol. 70, no. 12, pp. 1458-1471 (1980),
which is hereby incorporated by reference. Then, a
summation rule of the form

is used to sum up the masking effects from all the masking
signals near f. If the contrast error at f is less than c(f), the
model predicts that the error is invisible to human eyes.

In step 14, the host video data as subjected to a temporal 
wavelet transform in step 11, with the embedded watermark 
data from step 12 is further subjected to a non-frequency 
mask. Because the perceptual mask in step 12 is a frequency 
domain mask, a further mask is necessary to ensure that the 
embedded data remains invisible in the host video data. The 
non-frequency mask is a spatial mask. 

Frequency masking effects are localized in the frequency 
domain, while spatial masking effects are localized in the 
spatial domain. Spatial masking refers to the situation in which an
edge raises the perceptual threshold around it. Any model for
spatial masking can be used, and such models are well 
known in the art. However, the model used in one embodi- 
ment of the invention is similar to the model described in the 
Zhu, "Low Bit Rate ..." reference previously incorporated
by reference, and which is itself based on a model proposed
by Girod in "The Information Theoretical Significance of 
Spatial and Temporal Masking in Video Signals," in Pro- 
ceedings of the SPIE Human Vision, Visual Processing, and 
Digital Display, vol. 1077, pp. 178-187 (1989), which is 
also herein incorporated by reference. 

In one embodiment, the upper channel of Girod's model
is linearized under the assumption of small perceptual errors,
the model giving the tolerable error level for each pixel in
the image, as those skilled in the art can appreciate.
Furthermore, under certain simplifying assumptions
described in the Zhu "Low Bit Rate ..." reference, the tolerable
error level for a pixel p(x,y) can be obtained by first
computing the contrast saturation at (x,y)

$$dc_{sat}(x, y) = dc_{sat} = \sqrt{\frac{T}{\sum_{x', y'} w_4(0, 0, x', y')}},$$

where the weight w_4(x,y,x',y') is a Gaussian centered at the
point (x,y) and T is a visual test based threshold. Once
dc_sat(x,y) is computed, the luminance on the retina, dl_ret, is
obtained from the equation

From dl_ret, the tolerable error level ds(x,y) for the pixel
p(x,y) is computed from

The weights w_3(x,y) and w_2(x,y) are based on Girod's
model. The masking model predicts that changes to pixel
p(x,y) less than ds(x,y) introduce no perceptible distortion.

As has been described, steps 10, 11, 12 and 14 of FIG.
1 provide an overview of the video watermarking process of 
the present invention. An overview of the scene-based video 
watermarking process of the present invention is now 
described. 

Overview of the Scene-Based Video Watermarking 
Process 

Referring to FIG. 2, a flowchart of a method of a scene-based
video watermarking process, according to one
embodiment of the invention, is shown. The method utilizes
the watermarking method of FIG. 1 already described. In
step 24, a video sequence (i.e., the host video data) is broken
(segmented) into scenes, as known within the art. For
example, the reference J. Nam and A. H. Tewfik, "Combined
Audio and Visual Streams Analysis for Video Sequence
Segmentation," in Proceedings of the 1997 International
Conference on Acoustics, Speech and Signal Processing,
(Munich, Germany), pp. 2665-2668 (April 1997), which is
hereby incorporated by reference, describes such scene
segmentation. Segmentation into scenes allows the
watermarking procedures to take into account temporal
redundancy. Visually similar regions in the video sequence, e.g.,
frames from the same scene, must be embedded with a
consistent watermark. The invention is not limited to a
particular scene segmentation algorithm, however.

In step 26, a temporal wavelet transform is applied on the
video scenes, as has been previously described. That is, each
scene comprises a number of frames, such that a temporal
wavelet transform is applied to each frame within a scene.
The resulting frames are known as wavelet frames. The
multiresolution nature of the wavelet transform allows the
watermark to exist across multiple temporal scales,
resolving pirate attacks. For example, the embedded watermark in
the lowest frequency (DC) wavelet frame exists in all frames
in the scene.

In step 28, a watermark is embedded in each wavelet
frame. The watermark is designed and embedded in the
wavelet domain, such that the individual watermarks for
each wavelet frame are spread out to varying levels of
support in the temporal domain. For example, watermarks
embedded in highpass wavelet frames are localized
temporally. Conversely, watermarks embedded in lowpass wavelet
frames are generally located throughout the scene in the
temporal domain. The watermarks are embedded in
accordance with perceptual and non-frequency masks, as has been
described. That is, the watermarks are embedded in each
frame of each scene in accordance with perceptual and
spatial (non-frequency) characteristics of the frame, as has
been described in conjunction with the method of FIG. 1.

The scene-based video watermarking method of the
invention has several other advantages. It is scene-based and
video dependent, and directly exploits spatial masking,
frequency masking, and temporal properties such that the
embedded watermark is invisible and robust. The watermark
consists of static and dynamic temporal components that are
generated from a temporal wavelet transform of the video
scenes. The resulting wavelet frames are modified by a
perceptually shaped pseudo-random sequence representing
the author (owner). The noise-like watermark is statistically
undetectable to thwart unauthorized removal. Furthermore,
the author representation resolves the deadlock problem.
The multiresolution watermark may be detected on single
frames without knowledge of the location of the frames in
the video scene.

Because the video watermarking procedure is
perception-based, the watermark adapts to each individual video signal.
In particular, the temporal and frequency distributions of the
watermark are controlled by the masking characteristics of
the host video signal. As a result, the strength of the
watermark increases and decreases with host, e.g., higher
amplitude in regions of the video with more textures, edges,
and motion. This ensures that the embedded watermark is
invisible while having the maximum possible robustness.

Because the watermark representation is scene-based and
multiscale, given one or more frames from a potentially
pirated video, the watermark may be extracted from the
frames without knowledge of the location of the frame being
tested. This detection characteristic exists due to the
combined static and dynamic representation of the watermark.

The watermark representation of the invention provides
an author representation that solves the deadlock problem.
The author or owner of the video is represented with a
pseudo-random sequence created by a pseudo-random
generator and two keys. One key is author dependent, while the
second key is signal dependent. The representation is able to
resolve rightful ownership in the face of multiple ownership
claims.

The watermark representation of the invention also
provides a dual watermark. The watermarking scheme uses the
original video signal to detect the presence of a watermark.
The procedure can handle virtually all types of distortions,
including cropping, temporal rescaling, frame dropping,
etc., using a generalized likelihood ratio test. This procedure
is integrated with a second watermark which does not
require the original signal to address the deadlock problem.

As has been described, steps 24, 26, and 28 of FIG. 2
provide an overview of the scene-based watermarking
process of the present invention. The specifics of the hardware
implementation of the invention are now provided.

Hardware Implementation of the Invention

The present invention is not limited as to the type of
computer on which it runs. However, a typical example of
such a computer is shown in FIG. 3. Computer 16 is a
desktop computer, and may be of any type, including a
PC-compatible computer, an Apple Macintosh computer, a
UNIX-compatible computer, etc. Computer 16 usually
includes keyboard 18, display device 20 and pointing device
22. Display device 20 can be any of a number of different
devices, including a cathode-ray tube (CRT), etc. Pointing
device 22 as shown in FIG. 3 is a mouse, but the invention
is not so limited. Not shown is that computer 16 typically
also comprises a random-access memory (RAM), a
read-only memory (ROM), a central-processing unit (CPU), a
fixed storage device such as a hard disk drive, and a
removable storage device such as a floppy disk drive. The
computer program to implement the present invention is
typically written in a language such as C, although the
present invention is not so limited.

The specifics of the hardware implementation of the
invention have been described. A particular implementation
of the scene-based video watermarking of the invention,
based on the methods of FIG. 1 and FIG. 2, is now described.

Particular Implementation of Scene-Based Video
Watermarking

The embodiment shown in FIG. 4 illustrates a particular
implementation of scene-based video watermarking
according to the invention, as based on the methods of FIG. 1 and
FIG. 2 that have already been described. Referring now to
FIG. 4, a block diagram of this specific implementation of
scene-based video watermarking is shown. Video frames 32
(of video host data) are denoted such that Fi is the ith frame
in a video scene, where i=0, . . . , k-1. Frames are ordered
sequentially according to time. Each frame is of size n x m.
The video itself may be gray scale (8 bits/pixel) or color (24
bits/pixel). Frames 32 undergo a temporal wavelet
transformation 34, as has been described, to become wavelet frames
36. The tilde representation is used to denote a wavelet
representation. For example, F~i is the ith wavelet
coefficient frame. Without loss of generality, wavelet frames are
ordered from lowest frequency to highest frequency, i.e.,
F~0 is a DC frame. Thus, there are k wavelet coefficient
frames F~i, i=0, . . . , k-1.

In step 38, each wavelet frame F~i is segmented into 8x8
blocks B~ij, i=0, 1, . . . , (n/8) and j=0, 1, . . . , (m/8). In step
40, each block B~ij is subjected to a discrete cosine
transform (DCT), to become block B~ij'. In step 42, a perceptual
frequency mask, as has been described, is applied to each
block to obtain the frequency mask M'ij. In step 44, author
signature Yij (the watermark) also undergoes a discrete
cosine transform to become Y'ij. It should be noted that the
generation of author signature Yij is desirably in accordance
with the process that has been described in conjunction with
step 10 of FIG. 1, but the invention is not so limited.

In step 46, the mask M'ij is used to weight the noise-like
author signature Y'ij for that frame block, creating the
frequency-shaped author signature P'ij = M'ij Y'ij. In step 48, the spatial
mask S~ij is generated, as has been described, and in step 50,
the wavelet coefficient watermark block W~ij is obtained by
computing the inverse DCT of P'ij in step 52 and locally
increasing the watermark to the maximum tolerable error
level provided by the spatial mask S~ij. Finally, in step 54,
the watermark W~ij is added to the block B~ij, creating the
watermarked block. The process is repeated for each wavelet
coefficient frame F~i.

The watermark for each wavelet coefficient frame is the
block concatenation of all the watermark blocks for that
frame. The wavelet coefficient frames with the embedded
watermarks are then converted back to the temporal domain
using the inverse wavelet transform. As the watermark is
designed and embedded in the wavelet domain, the
individual watermarks for each wavelet coefficient frame are
spread out to varying levels of support in the temporal
domain. For example, watermarks embedded in highpass
wavelet frames are localized temporally. Conversely,
watermarks embedded in lowpass wavelet frames are generally
located throughout the scene in the temporal domain.
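
The per-block processing of steps 40 through 54 may be sketched as follows for a single 8x8 block. This is an illustration under stated assumptions, not the implementation of FIG. 4: the frequency mask `placeholder_frequency_mask` is a hypothetical stand-in (a fixed fraction of each DCT coefficient magnitude) for the perceptual model, and step 50 is interpreted here as applying one gain per block so that the watermark just reaches the spatial mask at its tightest pixel. All function names are illustrative.

```python
import math

N = 8  # block size used in step 38

# Orthonormal DCT-II basis matrix C[u][x] and its transpose.
C = [[(math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]
CT = [list(col) for col in zip(*C)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def dct2(block):
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    return matmul(matmul(CT, coeffs), C)


def placeholder_frequency_mask(coeffs, fraction=0.1):
    # ASSUMPTION: stand-in for the perceptual frequency masking model.
    return [[fraction * abs(c) for c in row] for row in coeffs]


def embed_block(block, signature, spatial_mask,
                freq_mask=placeholder_frequency_mask):
    """Return (watermarked block, watermark block) for one 8x8 block."""
    b_dct = dct2(block)                                  # step 40
    m = freq_mask(b_dct)                                 # step 42
    y_dct = dct2(signature)                              # step 44
    p = [[m[u][v] * y_dct[u][v] for v in range(N)]       # step 46
         for u in range(N)]
    w = idct2(p)                                         # step 52
    # Step 50, as interpreted here: scale up to the spatial mask limit.
    # spatial_mask entries are assumed to be strictly positive.
    peak = max(abs(w[i][j]) / spatial_mask[i][j]
               for i in range(N) for j in range(N))
    gain = 1.0 / peak if peak > 0 else 0.0
    w = [[gain * w[i][j] for j in range(N)] for i in range(N)]
    marked = [[block[i][j] + w[i][j] for j in range(N)]  # step 54
              for i in range(N)]
    return marked, w
```

With a uniform spatial mask of 2.0, the largest watermark sample produced by this sketch has magnitude 2.0, so no pixel change exceeds the tolerable error level supplied by the mask.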

The watermarks embedded within the video data according
to the method of FIG. 4 should be extractable even if
common signal processing operations are applied to the host
data. This is particularly true in the case of deliberate
unauthorized attempts to remove the watermark. For
example, a pirate may attempt to add noise, filter, code,
re-scale, etc., the host data in an attempt to destroy the
watermark. The embedded watermark, however, is noise-like
and its location over multiple blocks of the host data,
over successive frames of the data, is unknown. Therefore,
the pirate has insufficient knowledge to directly remove the
watermark. Furthermore, a different signature is used for
each block to further reduce unauthorized watermark
removal by cross correlation. Any destruction attempts are
done blindly.

Detection of the watermark is accomplished via a generalized
likelihood ratio test. Two methods have been developed
to extract the potential watermark from a test video or test
video frame. Both employ hypothesis testing. One test
employs index knowledge during detection, i.e., the placement
of the test video frame(s) relative to the original video
is known. The second detection method does not require
knowledge of the location of the test frame(s). This is
extremely useful in a video setting, where 1000's of frames
may be similar, and it is uncertain where the test frames
reside.

In the first method, watermark detection with index
knowledge, when the location of the test frame is known, a
straightforward hypothesis test may be applied. For each frame
in the test video Rk, a hypothesis test is performed.

H0: Xk = Rk - Fk = Nk (no watermark)

H1: Xk = Rk - Fk = W*k + Nk (watermark)

where Fk is the original frame, W*k is the (potentially
modified) watermark recovered from the frame, and Nk is
noise. The hypothesis decision is obtained by computing the
scalar similarity between each extracted signal and original
watermark Wk: Sk = Simk(Xk, Wk) = (Xk · Wk)/(Wk · Wk).
The overall similarity between the extracted and original
watermark is computed as the mean of Sk for all k: S = mean(Sk).
The overall similarity is compared with a threshold to
determine whether the test video is watermarked. The
experimental threshold is desirably chosen around 0.1, i.e.,
a similarity value >=0.1 indicates the presence of the
owner's copyright. In such a case, the video is deemed the
property of the author, and a copyright claim is valid. A
similarity value <0.1 indicates the absence of a watermark.
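
The index-aware hypothesis test can be sketched as follows. Frames and watermarks are represented as flat lists of numbers for simplicity, and the function names `similarity` and `detect_with_index` are hypothetical; the threshold default of 0.1 is the experimental value given above.

```python
def similarity(x, w):
    """Scalar similarity Sim(X, W) = (X . W) / (W . W)."""
    num = sum(a * b for a, b in zip(x, w))
    den = sum(b * b for b in w)
    return num / den


def detect_with_index(test_frames, original_frames, watermarks, threshold=0.1):
    """Return (S, decision), where S is the mean of Sk over all frames k."""
    sims = []
    for r, f, w in zip(test_frames, original_frames, watermarks):
        x = [ri - fi for ri, fi in zip(r, f)]   # Xk = Rk - Fk
        sims.append(similarity(x, w))           # Sk
    s = sum(sims) / len(sims)
    return s, s >= threshold
```

When the test frames equal the originals plus the unmodified watermark, every Sk is 1 and the decision is positive; when the test frames equal the originals, every Sk is 0 and no watermark is declared.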

When the length (in terms of frames) of the test video is
the same as the length of the original video, the hypothesis
test is performed in the wavelet domain. A temporal wavelet
transform of the test video is computed to obtain its wavelet
coefficient frames R~k. Thus,

H0: X~k = R~k - F~k = Nk (no watermark)

H1: X~k = R~k - F~k = W~*k + Nk (watermark)

where F~k are the wavelet coefficient frames from the
original video, W~*k are the potentially modified watermarks
from each frame, and Nk is noise. This test is performed for
each wavelet frame to obtain X~k for all k. Similarity values
are computed as before, Sk = Simk(X~k, W~k).

Using the original video signal to detect the presence of
a watermark, virtually all types of distortions can be
handled, including cropping, rotation, rescaling, etc., by
employing a generalized likelihood ratio test. A second
detection scheme which is capable of recovering a
watermark after many distortions without a generalized likelihood
ratio test has also been developed. The procedure is fast and
simple, particularly when confronted with the large amount
of data associated with video.

In the method for watermark detection without index 
knowledge, there is no knowledge of the indices of the test 
frames. Pirate tampering may lead to many types of derived 
videos which are often difficult to process. For example, a 
pirate may steal one frame from a video. A pirate may also 
create a video which is not the same length as the original 
video. Temporal cropping, frame dropping, and frame inter- 
polation are all examples. A pirate may also swap the order 
of the frames. Most of the better watermarking schemes 
currently available use different watermarks for different 
images. As such, they generally require knowledge of which 
frame was stolen. If they are unable to ascertain which frame 
was stolen, they are unable to determine which watermark 
was used. 

This method can extract the watermark without knowl- 
edge of where a frame belongs in the video sequence. No 
information regarding cropping, frame order, interpolated 
frames, etc., is required. As a result, no searching and 
correlation computations are required to locate the test frame 
index. The hypothesis test is formed by removing the low 
temporal wavelet frame from the test frame and computing 
the similarity with the watermark for the low temporal 
wavelet frame. The hypothesis test is formed as 

H0: Xk = Rk - F~0 = Nk (no watermark)

H1: Xk = Rk - F~0 = W~*k + Nk (watermark)
where Rk is the test frame in the spatial domain and F~0 is 
the lowest temporal wavelet frame. The hypothesis decision 
is made by computing the scalar similarity between each 
extracted signal Xk and original watermark for the low 
temporal wavelet frame W~0: Simk(Xk, W~0). This simple 
yet powerful approach exploits the wavelet property of 
varying temporal support. 
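
A minimal sketch of the test without index knowledge follows. It assumes the lowest temporal wavelet frame F~0 and its watermark W~0 have already been computed for the scene and are expressed on the same scale as the spatial-domain test frame; that normalization, the flat-list frame representation, and the function name detect_without_index are assumptions made for illustration only.

```python
def similarity(x, w):
    """Sim(X, W) = (X . W) / (W . W) for two equal-length flat sequences."""
    return sum(a * b for a, b in zip(x, w)) / sum(b * b for b in w)


def detect_without_index(test_frame, lowest_wavelet_frame,
                         lowest_wavelet_watermark, threshold=0.1):
    """Test a single frame whose position in the video is unknown.

    Only the lowest temporal wavelet frame and its watermark are needed,
    so no search over frame indices is performed.
    """
    # Xk = Rk - F~0
    x = [r - f for r, f in zip(test_frame, lowest_wavelet_frame)]
    s = similarity(x, lowest_wavelet_watermark)  # Simk(Xk, W~0)
    return s, s >= threshold


if __name__ == "__main__":
    f_low = [10.0, 20.0, 30.0, 40.0]   # hypothetical F~0
    w_low = [1.0, -1.0, 1.0, -1.0]     # hypothetical W~0
    stolen = [11.5, 19.5, 30.5, 38.5]  # F~0 + W~0 + frame-specific detail
    clean = [10.5, 20.5, 29.5, 39.5]   # F~0 + frame-specific detail only
    print(detect_without_index(stolen, f_low, w_low))
    print(detect_without_index(clean, f_low, w_low))
```

In the example the frame-specific detail is chosen to be orthogonal to the watermark, so it contributes nothing to the similarity; in practice it acts as part of the noise term Nk.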

Although specific embodiments have been illustrated and 
described herein, it will be appreciated by those of ordinary 
skill in the art that any arrangement which is calculated to 
achieve the same purpose may be substituted for the specific 
embodiments shown. This application is intended to cover 
any adaptations or variations of the present invention. 
Therefore, it is manifestly intended that this invention be 
limited only by the following claims and equivalents 
thereof. 

I claim: 

1. A computerized method for embedding data represent- 
ing a watermark into host data relating to video: 

generating the data representing the watermark; 

subjecting the host data to a temporal wavelet transform; 

embedding the data into the host data, as subjected to the 
temporal wavelet transform, in accordance with a per- 
ceptual mask conducted in the frequency domain; and, 

subjecting the host data, including the data embedded 
therein, to a non-frequency mask. 

2. The computerized method of claim 1, wherein the data 
representing the watermark comprises a pseudo-random 
sequence. 

3. The computerized method of claim 1, wherein gener- 
ating the data representing the watermark uses a pseudo- 
random generator and two keys to generate the data. 

4. The computerized method of claim 3, wherein the 
pseudo-random generator is selected from the group com- 
prising RSA, Rabin, Blum/Micali, and Blum/Blum/Shub. 

5. The computerized method of claim 1, wherein the 
perceptual mask comprises a model in which a contrast 
threshold at a frequency f is expressed as a function of the
frequency f, a masking frequency f_m and a masking contrast c_m:

c(f, f_m) = c_0(f) · Max{1, [f(f/f_m) c_m]^α},

where c_0(f) is a detection threshold at the frequency f.

6. The computerized method of claim 1, wherein the 
non-frequency mask comprises a spatial mask. 

7. The computerized method of claim 1, wherein subject- 
ing the host data to a temporal wavelet transform results in 
a multiresolution temporal representation of the video hav- 
ing temporal lowpass frames and temporal highpass frames. 

8. A scene-based computerized method of watermarking 
host data relating to video comprising: 

segmenting the host data into a plurality of scenes, each 
scene having a plurality of frames; 

subjecting each frame of each scene to a temporal wavelet 
transform; and, 

embedding each frame of each scene, as has been sub- 
jected to the temporal wavelet transform, with a water- 
mark in accordance with perceptual and spatial char- 
acteristics of the frame. 

9. The scene-based computerized method of claim 8, 
wherein subjecting each frame of each scene to the temporal 
wavelet transform results in lowpass wavelet frames and 
highpass wavelet frames. 

10. The scene-based computerized method of claim 9, 
wherein watermarks embedded in lowpass wavelet frames 
are located throughout the scene in a temporal domain. 

11. The scene-based computerized method of claim 9,
wherein watermarks embedded in highpass wavelet frames 
are localized temporally. 

12. A computerized system for watermarking host data 
relating to video and having a plurality of scenes, each scene 
having a plurality of frames, comprising: 

a processor; 

a computer-readable medium; 

computer-executable instructions executed by the proces- 
sor from the computer-readable medium comprising: 
applying a temporal wavelet transform to each frame; 
segmenting each frame of each scene into blocks; 
applying a discrete cosine transform (DCT) to each
block to generate a frequency block corresponding to 
the block; 

generating a perceptual mask for each frequency block; 
applying the DCT to a watermark for each frequency 
block; 

weighting the perceptual mask for each frequency 
block with the watermark for the frequency block to 
which the DCT has been applied to generate a 
frequency-shaped author block; 

applying an inverse DCT to each frequency-shaped 
author block to generate a time-domain block; 

generating a spatial mask for each block; 

weighting each time-domain block by a spatial mask to 
generate a watermark block; and, 

adding each block to a corresponding watermark block 
to generate a watermarked block. 

13. A computer-readable medium having a computer 
program stored thereon to cause a suitably equipped
computer to perform a method comprising:

applying a temporal wavelet transform to each frame; 
segmenting each frame of each scene into blocks; 
applying a discrete cosine transform (DCT) to each block
to generate a frequency block corresponding to the
block;

generating a perceptual mask for each frequency block; 
applying the DCT to a watermark for each frequency 
block; 

weighting the perceptual mask for each frequency block 
with the watermark for the frequency block to which 
the DCT has been applied to generate a frequency- 
shaped author block; 

applying an inverse DCT to each frequency-shaped author 
block to generate a time-domain block;

generating a spatial mask for each block; 

weighting each time-domain block by a spatial mask to 
generate a watermark block; and, 

adding each block to a corresponding watermark block to 
generate a watermarked block. 

14. The computer-readable medium of claim 13, wherein 
the computer-readable medium is a floppy disk. 